perm filename DBL6.TEX[AM,DBL] blob
sn#547164 filedate 1980-11-19 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00013 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00003 00002 \init{
C00004 00003 \chapbegin{6} % Here beginneth Chapter 6
C00006 00004 \sectionbegin[1]{What AM Did}
C00012 00005 \subsectionbegin[1.1]{Linear Task-by-Task Summary of a Good Run}
C00046 00006 \subsectionbegin[1.2]{Two-Dimensional Behavior Graph}
C00053 00007 \subsectionbegin[1.3]{AM as a Computer Program}
C00060 00008 \sectionbegin[2]{Experiments with AM}
C00070 00009 \subsectionbegin[2.1]{Must the Worth numbers be finely tuned?}
C00081 00010 \subsectionbegin[2.2]{How Finely Tuned is the Agenda?}
C00089 00011 \subsectionbegin[2.3]{How Valuable is Tacking Reasons onto Each Task?}
C00094 00012 \subsectionbegin[2.4]{What if Certain Concepts are Eliminated or Added?}
C00100 00013 \subsectionbegin[2.5]{Can {\AM} Work in a New Domain: Plane Geometry?}\par
C00108 ENDMK
C⊗;
\init{
\input cmpdes
\input ammacs
\def\draft{T}
\titlepage
\newpage
\setcount0 103
\parttitle{AM: DISCOVERY IN MATHEMATICS AS HEURISTIC SEARCH}
}
\baselineskip 12pt
\chapbegin{6} % Here beginneth Chapter 6
\chaptertitle{RESULTS}
\rjustline{\:B Results}
\runningrighthead{INTRODUCTION}
\tenpoint
\vskip 14pc
\epigraph{Now we have seen that mathematical work is not simply mechanical, that it
could not be done by a machine, however perfect. It is not merely a question
of applying rules, of making the most combinations possible according to certain
fixed laws. The combinations so obtained would be exceedingly numerous, useless,
and cumbersome. The true work of the inventor consists in choosing among these combinations
so as to eliminate the useless ones or rather to avoid the trouble of making them,
and the rules which must guide this choice are extremely fine and delicate. It is
almost impossible to state them precisely; they are felt rather than formulated.
Under these conditions, how imagine a sieve capable of applying them
mechanically?}
\author{Poincar\'e}
\epigraphend
\sectionbegin[1]{What AM Did}
{\AM} is both a mathematician of sorts, and a big computer program.
\listbegin
\bullistentry{By granting
{\AM} more anthropomorphic qualities than it deserves, we
can describe its progress through elementary mathematics. It
rediscovered many well-known concepts, a couple interesting but
not-generally-known ones, and several concepts which were hitherto
unknown and should have stayed that way. Section
1--5 recapped what {\AM} did, much
as a historian might critically evaluate Euler's work.
Instead of repeating any of this descriptive prose here, we now
provide a
very brief listing of what {\AM} did in a single good run, task by task.
These task-by-task listings are not
complete listings of every task
{\AM} ever attempted in any of its many runs,
but rather a trace of a single, better-than-average run of
the \ffootnote{program.}{In fact,
it is perhaps the best overall run. It occurred in two stages (due to
space problems; unimportant). In this particular run, {\AM}
misses the few ``very best'' discoveries it ever made,
since the runs they occurred in went in somewhat different directions. It
also omits some of the more boring tasks: see, \eg, the description
of task number 33. } The reader may wish to consult the brief
alphabetized glossary of concept names in the previous chapter.
Following this linear trace of {\AM}'s behavior is a more appropriate representation
of what it did: namely, a two-dimensional graph of that
same behavior as seen in ``concept-space.''}
\bullistentry{By under-estimating
{\AM}'s sophistication, one can demand answers to
the typical questions to ask about a computer program: how big is it,
how much cpu time does it use, what language it's coded in, etc. These
are
found in the third subsection (6--1.3).}
\listend
\subsectionbegin[1.1]{Linear Task-by-Task Summary of a Good Run}
\yskip
{\ninepoint \bf \parindent 0pt \parskip 1pt plus 1pt minus 0pt \setcount8 0
\hangindent 30pt \hh Fill in examples of Compose. Failed, but suggested next task:
\hangindent 30pt \hh Fill in examples of Set-union. Also failed, but suggested:
\hangindent 30pt \hh Fill in examples of Sets. Many found (\eg, by instantiating
Set.Defn) and then more derived from those examples (\eg, by running
Union.Alg).
\hangindent 30pt \hh Fill in specializations of Sets (because it was very easy to find
examples of Sets). Creation of new concepts. One, INT-Sets, is
related to ``Singletons.'' Another, ``BI-Sets,'' is all nests of braces
(no atomic elements).
\hangindent 30pt \hh Fill in examples of INT-Sets. This indirectly led to a rise in the
worth of Equal.
\hangindent 30pt \hh Check all examples of INT-Sets. All were confirmed.
{\AM} defines the set of Nonempty INT-Sets; this is renamed ``Singletons'' by the user.
\hangindent 30pt \hh Check all examples of Sets. To check a couple conjectures, {\AM}
will soon look for Bags and Osets.
\hangindent 30pt \hh Fill in examples of Bags.
\hangindent 30pt \hh Fill in specializations of Bags. Created INT-Bags (contain just one kind of
element), and BI-Bags (nests of parentheses).
\hangindent 30pt \hh Fill in examples of Osets.
\hangindent 30pt \hh Check examples of Osets.
\hangindent 30pt \hh Fill in examples of Lists.
\hangindent 30pt \hh Check examples of Lists.
\hangindent 30pt \hh Fill in examples of All-but-first.
\hangindent 30pt \hh Fill in examples of All-but-last.
\hangindent 30pt \hh Fill in specializations of All-but-last. Failed.
\hangindent 30pt \hh Fill in examples of List-union.
\hangindent 30pt \hh Fill in examples of Proj1.
\hangindent 30pt \hh Check examples of All-but-first.
\hangindent 30pt \hh Check examples of All-but-last.
\hangindent 30pt \hh Fill in examples of Proj2.
\hangindent 30pt \hh Fill in examples of Empty-structures. 4 found.
\hangindent 30pt \hh Fill in generalizations of Empty-structures. Failed.
\hangindent 30pt \hh Check examples of List-union.
\hangindent 30pt \hh Check examples of Bags. Defined Singleton-bags.
\hangindent 30pt \hh Fill in examples of Bag-union.
\hangindent 30pt \hh Check examples of Proj2.
\hangindent 30pt \hh Fill in examples of Set-union.
\hangindent 30pt \hh Check examples of Set-union. Define
$\lambda (x,y) \ x\union y=x$, later called Superset.
\hangindent 30pt \hh Fill in examples of Bag-insert.
\hangindent 30pt \hh Check examples of Bag-insert. Range is really Nonempty bags. Isolate the results
of insertion restricted to Singletons: call them Doubleton-bags.
\hangindent 30pt \hh Fill in examples of Bag-intersect.
\hangindent 30pt \hh Fill in examples of Set-insert.
\hangindent 30pt \hh Check examples of Set-insert. Range is always Nonempty sets.
Define $\lambda $ (x,S) Set-insert(x,S)=S; \ie, set membership.
Define Doubleton sets.
\hangindent 30pt \hh Fill in examples of Bag-delete.
\hangindent 30pt \hh Fill in examples of Bag-difference.
\hangindent 30pt \hh Check examples of Bag-intersect.
Define $\lambda (x,y) \ x\inter y=()$; \ie disjoint bags.
\hangindent 30pt \hh Fill in examples of Set-intersect.
\hangindent 30pt \hh Check examples of Set-intersect.
Define $\lambda (x,y) \ x\inter y=x$; \ie, subset.
Also define disjoint sets: $\lambda (x,y)\ x\inter y=\{\}$.
\hangindent 30pt \hh Fill in examples of List-intersect.
\hangindent 30pt \hh Fill in examples of Equal. Very difficult to find examples; this
led to:
\hangindent 30pt \hh Fill in generalizations of Equal. Define
``Same-size,'' ``Equal-CARs,'' and some losers.
\hangindent 30pt \hh Fill in examples of Same-size.
\hangindent 30pt \hh Apply an Algorithm for Canonize to the args Same-size and Equal.
{\AM} eventually synthesizes the canonizing function ``Size.'' {\AM} defines
the set of canonical structures: bags of T's; this later gets renamed
as ``Numbers.''
\hangindent 30pt \hh Restrict the Domain/range of Bag-union. A new operation is
defined, Number-union, with Domain/range entry <Number Number $\→$
Bag>.
\hangindent 30pt \hh Fill in examples of Number-union. Many found.
\hangindent 30pt \hh Check the Domain/range of Number-union. Range is `Number'.
This operation is renamed ``Add2.''
\hangindent 30pt \hh Restrict the Domain/range of Bag-intersect to Numbers. Renamed
``Minimum.''
\hangindent 30pt \hh Restrict the Domain/range of Bag-delete to Numbers. Renamed
``SUB1.''
\hangindent 30pt \hh Restrict the Domain/range of Bag-insert to Numbers.
{\AM} calls the new operation ``Number-insert.'' Its
Domain/range entry is <Anything Number $\→$ Bag>.
\hangindent 30pt \hh Check the Domain/range of Number-insert. This doesn't lead
anywhere.
\hangindent 30pt \hh Restrict the Domain/range of Bag-difference to Numbers. This
becomes ``Subtract.''
\eject
\hangindent 30pt \hh Fill in examples of Subtract.
This leads to defining the relation LEQ ($≤$\ffootnote{).}{If a
larger number is ``subtracted'' from a smaller, the result is zero. {\AM}
explicitly defines the set of ordered pairs of numbers having zero
``difference.'' <x,y> is in that set iff x is less than or equal to y. }
\hangindent 30pt \hh Fill in examples of LEQ. Many found.
\hangindent 30pt \hh Check examples of LEQ.
\hangindent 30pt \hh Apply algorithm of Coalesce to LEQ. LEQ(x,x) is Constant-True.
\hangindent 30pt \hh Fill in examples of Parallel-join2. Included is
Parallel-join2(Bags,Bags,Proj2), which is renamed ``TIMES,''
and
Parallel-join2(Structures,Structures,Proj1), a generalized
Union operation renamed ``G-Union,'' and a bunch of losers.
\hangindent 30pt \hh--{\669}. Fill in
and check examples of the operations just
created.
\setcount8 69
\hangindent 30pt \hh Fill in examples of Coalesce. Created: Self-Compose, Self-Insert,
Self-Delete, Self-Add, Self-Times, Self-Union, etc. Also:
Coa-repeat2, Coa-join2, etc.
\hangindent 30pt \hh Fill in examples of Self-Delete. Many found.
\hangindent 30pt \hh Check examples of Self-Delete. Self-Delete is just Identity-op.
\hangindent 30pt \hh Fill in examples of Self-member. No positive examples found.
\hangindent 30pt \hh Check examples of Self-member. Self-member is just Constant-False.
\hangindent 30pt \hh Fill in examples of Self-Add. Many found. User renames this ``Doubling.''
\hangindent 30pt \hh Check examples of Coalesce. Confirmed.
\hangindent 30pt \hh Check examples of Add2. Confirmed.
\hangindent 30pt \hh Fill in examples of Self-Times.
Many found.
Renamed ``Squaring'' by the user.
\hangindent 30pt \hh Fill in examples of Self-Compose.
Defined Squaring$\circ$Squaring.
Created Add$\circ$Add (two versions: Add2$↓1$ which is
$\lambda (x,y,z)\ (x+y)+z$, and Add2$↓2$ which is $x+(y+z)$).
Similarly, two
versions of Times$\circ$Times and of
Compose$\circ$Compose.
\hangindent 30pt \hh Fill in examples of Add2$↓1$ (\ie, $(x+y)+z$).
Many are found.
\hangindent 30pt \hh Fill in examples of Add2$↓2$ (i.e, $x+(y+z)$). Again many are found.
\hangindent 30pt \hh Check examples of Squaring. Confirmed.
\hangindent 30pt \hh Check examples of Add2$↓2$. Add2$↓1$ and Add2$↓2$ appear equivalent. But
first:
\hangindent 30pt \hh Check examples of Add2$↓1$. Add2$↓1$ and Add2$↓2$ still appear equivalent.
Merge them. So the proper argument for a generalized ``Add'' operation
is a Bag.
\hangindent 30pt \hh Apply algorithm for Invert to argument `Add'.
Define Inv-add(x) as
the set of all bags of numbers (>0) whose sum is x. Also denoted Add\inv(x).
\hangindent 30pt \hh Fill in examples of TIMES2$↓1$. (xy)z. Many are found.
\hangindent 30pt \hh Fill in examples of TIMES2$↓2$. x(yz). Again many are found.
\hangindent 30pt \hh Check examples of TIMES2$↓2$.
TIMES2$↓1$ and TIMES2$↓2$ may be equivalent.
\hangindent 30pt \hh Check examples of TIMES2$↓1$. TIMES2$↓1$ and TIMES2$↓2$ still appear
equivalent. Merge them. So the proper argument for a generalized
``TIMES'' operation is a Bag. Set up an analogy between TIMES and ADD,
because of this fact.
\hangindent 30pt \hh Apply algorithm for Invert to argument `TIMES'. Define
Inv-TIMES(x) as the set of all bags of numbers (>1) whose product is x.
Also denoted TIMES\inv. Analogic to Inv-Add.
\hangindent 30pt \hh Fill in examples of Parallel-replace2. Included are
Parallel-replace2(Bags,Bags,Proj2) (called MR2-BBP2), and many losers.
\hangindent 30pt \hh--{\6107. } Fill in and
check examples of the operations just
created.
\setcount8 107
\hangindent 30pt \hh Fill in examples of Compose.
So easy that {\AM} creates Int-Compose.
\hangindent 30pt \hh Fill in examples of Int-Compose.
The two chosen operations G,H
must be such that range(H) is a component (subset) of domain(G),
and range(G) is a component of domain(H); both G
and H must be interesting.
Create G-Union$\circ$\ffootnote{MR2-BBP2,}{An alternate
derivation of the operation of multiplication. }
Insert$\circ$Delete, Times$\circ$Squaring, etc.
\hangindent 30pt \hh--{\6127. } Fill in and check examples of the compositions just
created. Notice that G-Union$\circ$MR2-BBP2 is just
TIMES.
\setcount8 127
\hangindent 30pt \hh Fill in examples of Coa-repeat2. Among them:
Coa-repeat2(Bags-of-Numbers, Add2) [{\sl multiplication again!}],
Coa-repeat2(Bags-of-Numbers, Times) [{\sl exponen\-ti\-a\-tion}],
Coa-repeat2(Structures, Proj1) [CAR],
Coa-repeat2(Structures, Proj\-2) [{\sl Last-element-of}], etc.
\hangindent 30pt \hh Check the examples of Coa-repeat2. All confirmed.
\hangindent 30pt \hh Apply algorithms for Invert to `Doubling'.
The result is called ``Halving'' by the user.
{\AM} then defines ``Evens.''
\hangindent 30pt \hh Fill in examples of Self-Insert.
\hangindent 30pt \hh Check examples of Self-Insert. Nothing special found.
\hangindent 30pt \hh Fill in examples of Coa-repeat2-Add2.
\hangindent 30pt \hh Check examples of Coa-repeat2-Add2. It's the same as TIMES.
\hangindent 30pt \hh Apply algorithm for Invert to argument `Squaring'. Define
``Square-root.''
\hangindent 30pt \hh Fill in examples of Square-root. Some found, but very inefficiently.
\hangindent 30pt \hh Fill in new algorithms for Square-root. Had to ask user for a good one.
\hangindent 30pt \hh Check examples of Square-root. Define the set of numbers
``Perfect-squares.''
\hangindent 30pt \hh Fill in examples of Coa-repeat2-Times. This is exponentiation.
\hangindent 30pt \hh Check examples of Coa-repeat2-Times. Nothing special noticed,
unfortunately.
\hangindent 30pt \hh Fill in examples of Inv-TIMES.
Many found, but inefficiently.
\hangindent 30pt \hh Fill in new algorithms for Inv-TIMES. Obtained opaquely from the user.
\hangindent 30pt \hh Check examples of Inv-TIMES. This task suggests the next one:
\hangindent 30pt \hh Compose G-Union with Inv-TIMES. Good Domain/range. Renamed
``Divisors.''
\hangindent 30pt \hh Fill in examples of Divisors. Many found, but not very efficiently.
\hangindent 30pt \hh Fill in new algorithms for Divisors. Obtained from the user.
\hangindent 30pt \hh Fill in examples of Perfect-squares. Many found.
\hangindent 30pt \hh Fill in specializations of TIMES.
Times1(x) $=↓{df} 1\cdot x$, Times2(x) $=↓{df} 2\cdot x$,
Times-sq is TIMES with its domain restricted to bags of perfect squares, Times-ev
takes only even arguments,
Times-to-evens requires that the result be even, Times-to-sq, $\cdots $
\hangindent 30pt \hh Check examples of Divisors.
Define 0-Div, 1-Div, 2-Div, and 3-Div, the sets of numbers
whose Divisors value is the empty set, a singleton, a doubleton, and a tripleton,
respectively.
\hangindent 30pt \hh Fill in examples of 1-Div.
Only one example found: ``1.''
Lower 1-Div.Worth.
\hangindent 30pt \hh Fill in examples of 0-Div.
None found. Lower the worth of this concept.
\hangindent 30pt \hh Fill in examples of 2-Div.
A nice number of examples are found. Raise 2-Div.Worth.
\hangindent 30pt \hh Check examples of 2-Div. All confirmed,
but no pattern noticed.
\hangindent 30pt \hh Fill in examples of 3-Div. A nice number found.
\hangindent 30pt \hh Check examples of 3-Div. All confirmed. All are perfect squares.
\hangindent 30pt \hh Restrict Square-root to numbers which are in 3-Div. Call this Root3.
\hangindent 30pt \hh Fill in examples of Root3. Many found.
\hangindent 30pt \hh Check examples of Root3. All confirmed. All are in 2-Div.
Raise their worths.
\hangindent 30pt \hh Restrict Squaring to 2-divs. Call the result Square2.
\hangindent 30pt \hh Fill in examples of Square2. Many found.
\hangindent 30pt \hh Check the range of Square2. Always 3-Divs.
Conjecture: x has 2 divisors iff x$↑2$ has 3 divisors.
\hangindent 30pt \hh Restrict Squaring to 3-Divs. Call the result Square3.
\hangindent 30pt \hh Restrict Square-rooting to 2-Divs. Call the result Root2.
\hangindent 30pt \hh Fill in examples of Square3. Many found.
\hangindent 30pt \hh Compose Divisors-of and Square3. Call the result Div-Sq3.
\hangindent 30pt \hh Fill in examples of Div-Sq3. Many found.
\hangindent 30pt \hh Check examples of Div-Sq3.
All such examples are Same-size.
\hangindent 30pt \hh--{\6175. } More confirmations and explorations of the above conjecture.
Gradually, all its ramifications lead to dead-ends (as far as {\AM} is concerned).
\setcount8 175
\hangindent 30pt \hh Fill in examples of Root2. None found.
Conjecture that there are none.
\hangindent 30pt \hh Check examples of Inv-TIMES. Inv-TIMES always contains a singleton bag, and
always contains a bag of primes.
\hangindent 30pt \hh Restrict the range of Inv-TIMES to bags of primes. Call this
Prime-Times.
\hangindent 30pt \hh Restrict the range of Inv-TIMES to singletons. Called
Single-Times.
\hangindent 30pt \hh Fill in examples of Prime-times. Many found.
\hangindent 30pt \hh Check examples of Prime-times. Always
a singleton set.
User renames this conjecture {\it ``The unique factorization theorem.''}
\hangindent 30pt \hh Fill in examples of Single-TIMES. Many found.
\hangindent 30pt \hh Check examples of Single-TIMES. Always
a singleton set.
Single-TIMES is actually the same as Bag-insert!
\hangindent 30pt \hh Fill in examples of Self-set-union. Many found.
\hangindent 30pt \hh Check examples of Self-set-union. This operation is same as
Identity.
\hangindent 30pt \hh Fill in examples of Self-bag-union. Many found.
\hangindent 30pt \hh Check examples of Self-bag-union. Confirmed. Nothing interesting noticed.
\hangindent 30pt \hh Fill in examples of Inv-ADD.
\hangindent 30pt \hh Check examples of Inv-ADD. Hordes of boring conjectures, so:
\hangindent 30pt \hh Restrict the domain of Inv-ADD to primes (Inv-Add-primes), to evens
(Inv-Add-evens), to squares, etc.
\hangindent 30pt \hh Fill in examples of Inv-add-primes. Many found.
\hangindent 30pt \hh Check examples of Inv-add-primes. Confirmed, but nothing special noticed.
\hangindent 30pt \hh Fill in examples of Inv-add-evens. Many found.
\hangindent 30pt \hh Check examples of Inv-add-evens. Always contains
a bag of primes.
\hangindent 30pt \hh Restrict the range of Inv-Add-evens to bags of primes. Called
Prime-ADD.
\hangindent 30pt \hh Restrict the range of Inv-ADD to singletons. Call that new operation
Single-ADD.
\hangindent 30pt \hh Fill in examples of Prime-ADD. Many found.
\hangindent 30pt \hh Check examples of Prime-ADD. Always
a nonempty set (of bags of primes).
User renames this conjecture {\it ``Goldbach's conjecture.''}
\hangindent 30pt \hh Fill in examples of Single-ADD. Many found.
\hangindent 30pt \hh Check examples of Single-ADD. Always
a singleton set. This operation is the same as
Bag-insert and Single-TIMES.
\hangindent 30pt \hh Restrict the range of Prime-ADD to singletons, by analogy to
\ffootnote{Prime-TIMES.}{In this case, {\AM} is asking which numbers are uniquely representable
as the sum of two primes.} Call the new operation Prime-ADD-SING.
\hangindent 30pt \hh Fill in examples of Prime-ADD-SING. Many found.
\hangindent 30pt \hh Check examples of Prime-ADD-SING. Nothing special noticed.
\hangindent 30pt \hh Fill in examples of \footnote{Times-sq.}{Recall that this is just TIMES restricted
to operate on perfect squares. } Many examples found.
\hangindent 30pt \hh Check Domain/range of Times-sq. Is the range actually Perfect-squares?
Yes!
\hangindent 30pt \hh Fill in examples of Times1.
Recall that Times1(x) $=↓{df} $ TIMES(1,x) = 1$\cdot $x = x.
\hangindent 30pt \hh Check examples of Times1.
Apparently just a restriction of Identity.
\hangindent 30pt \hh Check examples of Times-sq. Confirmed.
\hangindent 30pt \hh Fill in examples of Times0.
\hangindent 30pt \hh Fill in examples of Times2.
\hangindent 30pt \hh Check examples of Times2. Apparently the same as Doubling.
That is, $x+x=2\cdot x$. Very important.
By analogy, define Ad2(x) as x+2.
\hangindent 30pt \hh Fill in examples of Ad2.
\hangindent 30pt \hh Check examples of Ad2. Nothing interesting noticed.
\hangindent 30pt \hh Fill in specializations of Add. Among those created are:
Add0 (x+0), Add1, Add3, ADD-sq (addition restricted to perfect squares),
Add-ev (sum of even numbers), Add-pr (sum of primes), etc.
\hangindent 30pt \hh Check examples of Times0. The value always seems to be 0.
\hangindent 30pt \hh Fill in examples of \footnote{Times-ev.}{Recall
that Times-ev is just like
TIMES restricted to operating on even numbers. } Many examples found.
\hangindent 30pt \hh Check examples of Times-ev. Apparently all the results are Evens.
\hangindent 30pt \hh Fill in examples of \footnote{Times-to-ev.}{That is, consider bags of numbers
which multiply to give an even number. } Many found.
\hangindent 30pt \hh Fill in examples of Times-to-sq. Only a few found.
\hangindent 30pt \hh Check examples of Times-to-sq. All arguments always seem to be squares.
Conjec: Times-to-sq is really the same as Times-sq. Merge the two.
This is a false conjecture, but did {\AM} no harm.
\hangindent 30pt \hh Check examples of Times-to-ev. The domain always contains an even number.
\hangindent 30pt \hh Fill in examples of Self-Union.
\hangindent 30pt \hh Check examples of Self-Union.
\hangindent 30pt \hh Fill in examples of SubSet.
\hangindent 30pt \hh Check example of SubSet.
\hangindent 30pt \hh Fill in examples of SuperSet.
\hangindent 30pt \hh Check examples of SuperSet.
Conjec: Subset(x,y) iff Superset(y,x). Important.
\hangindent 30pt \hh Fill in examples of Compose$\circ$Compose-1. {\AM} creates
some explosive combination (\eg,
(Compose$\circ$Compose)$\circ$(Compose$\circ$Compose)$\circ$(Compose$\circ$Compose)),
some poor ones (\eg, Square$\circ$Count$\circ$ADD\inv),
and even a few---very few---winners (\eg,
SUB1$\circ$Count$\circ$Self-Insert, which is later recognized to coincide with
Count).
\hangindent 30pt \hh Check examples of Compose$\circ$Compose-1.
\hangindent 30pt \hh Fill in examples of Compose$\circ$\ffootnote{Compose-2.}{Recall that the difference
between this operation and the last one is merely in the order of the composing:
F$\circ$(G$\circ$H) versus (F$\circ$G)$\circ$H. }
{\AM} recreates many of the previous tasks' operations.
\hangindent 30pt \hh Check examples of Compose$\circ$Compose-2.
Nothing noticed \footnote{yet.}{Later on,
{\AM} will use these new operations to discover the associativity of Compose. }
\hangindent 30pt \hh--{\6252. } Fill in and check examples of the losing compositions just
created.
\setcount8 252
\hangindent 30pt \hh Fill in examples of Add-sq (\ie, sum of squares).
\hangindent 30pt \hh Check Domain/range entries of Add-sq. The range is not always perfect squares.
Define Add-sq-sq(x,y),
which is True iff x and y are perfect squares and their sum is a perfect
square as well.
\hangindent 30pt \hh Fill in examples of Add-pr; \ie, addition of primes.
\hangindent 30pt \hh Check Domain/range entries of Add-pr. {\AM} defines the set of pairs of primes
whose sum is also a prime. This is a bizarre derivation of prime pairs.
\tenpoint \rm \parindent 20pt \parskip 0pt }
\subsectionbegin[1.2]{Two-Dimensional Behavior Graph}
On the next two pages is a graph of the same ``best run'' which {\AM} executed.
The nodes are concepts, and the links are actions which {\AM} performed.
Labels on the links indicate when each action was taken, so the reader
may observe how {\AM} jumped around. It should also be
easy to perceive from the graph
which paths of development were abandoned, which concepts ignored, and which ones
concentrated upon. These are precisely the features of {\AM}'s behavior which are
awkward to infer from a simple linear trace (as in the previous section).
In more detail, here is how to read the graph:
Each node is a concept.
To save space, these names are often highly abbreviated. For example,
``x0'' is used in place of ``TIMES-0.''
Each concept name is surrounded by from zero to four numbers:
\han6{\6 318 \hskip 25pt 288}
\han6{\bf \ WIDGETING}
\han6{\6 310 \hskip 25pt 291}
\noindent The upper right number indicates the task number (see last section) during which
examples of this concept were filled in. The lower right number tells when they
were checked.
The upper left number indicates when the Domain/range facet of that
concept was modified. Finally, the lower left number is the task number during
which some new Algorithms for that concept were obtained.
A number in parentheses indicates that the task with that number was a total
failure.
Because of the limited space, it was decided that if a concept were ever
renamed by the user, then only that newer, mnemonic name would be given in
the diagram. Thus there is an arrow from ``Coalesce'' to ``\4Square\1,'' an operation
originally called ``Self-Times'' by {\AM}.
Sometimes, a concept will have under it a note of the form \6$≡$
GROK\1.
This simply means that {\AM} eventually discovered that the concept was
equivalent to the already-known concept ``GROK,'' and probably forgot about
this one (merged it into the one it already knew about).
The ``trail'' of discovery may pick up again at that preexisting concept.
A node written as \6=GROK\0 means that the concept was really the same as
``GROK,'' but {\AM} never investigated it enough to notice this.
The arrows indicate the creation of new concepts. Thus an arrow leading
to concept ``Widgeting'' indicates how that concept was created. An arrow directed
away from Widgeting
points to a concept created as, \eg, a specialization or an
example of Widgeting. No arrowheads are
in practice necessary:
all arrows are directed \4downward\1.
The arrows may be labeled, indicating precisely what they represent (\eg,
composition, restriction) and what the task number was when they occurred.
For space reasons, the following convention has proven necessary:
if an arrow emanating from C is unnumbered,
it is assumed to have occurred at the same time
as the arrow to its immediate left which also points from C;
if all the arrows
emanating from C have no number, than all their times of occurrence are
assumed to be the \ffootnote{{\sl lower right}}{This is often
true because many concepts are created
while checking examples of some known concept.}
number of C.
Finally, if C has no lower right number, the arrow is assumed to have the
value of the upper right number of C.
An unlabeled arrow is assumed to be an act of
Specialization or the creation of
an \footnote{Example.}{It
should be clear in each context which is happening. If not, refer to the
short trace in the preceding section, and look up the appropriate task number.}
Labels, when they do occur, are given in capitals and small letters;
concept names (nodes) are by contrast in all capitals.
All the numbers correspond to those given to the tasks
in the task-by-task traces presented
in the last section.
The first part of this graph (presented below) contains static structural
(and ultimately numerical) concepts which were studied by {\AM}:
\vfill
TO THE PRINTER: PLEASE PASTE THE SMALL GRAPH FIGURE IN HERE!
\vfill
The rest of the graph (presented on the next page) deals with activities which were
investigated:
\newpage
\vfill
TO THE PRINTER: PLEASE PASTE THE LARGE GRAPH FIGURE IN HERE!
\vfill
\eject
\subsectionbegin[1.3]{AM as a Computer Program}
When viewed as a large LISP program, there is very little of interest about
{\AM}. There are the usual battery of customized functions (\eg, a conditional
PRINT function), the storage hacks (special emergency garbage collection
routines, which know which facets are expendible), the time hacks
(omnisciently arrange clauses in a conjunction so that the one most likely
to fail will come first), and the bugs (if the user renames a concept
while it's the current one being worked on,
there is a 5\%\ chance of {\AM} entering an infinite loop).
Below are listed a few parameters of the system, although I doubt that
they hold any theoretical significance. The reader may be curious about how
big {\AM} is, how long it takes to execute, etc.
Machine: SUMEX, PDP-10, KI-10 uniprocessor, 256k core memory.
Language: Interlisp, January 1975 release,
which occupies 140k of the total 256k,
but which provides a surplus ``shadow space''
of 256k additional words available for holding compiled code.
{\AM} support code:
200 compiled (not block-compiled) utility routines, control routines, etc.
They occupy roughly 100k, but all are pushed into the shadow space.
{\AM} itself: 115 concepts, each occupying about .7k
(about two typed pages, when Pretty-printed with indentation).
Facet/entries stored as
property/value on the property list of atoms whose names are concepts'
\ffootnote{names.}{Snazzy feature:
Executable entries on facets (\eg, an entry on Union.Alg) are stored uncompiled
until the first time they are actually called on, at which time they are compiled
and then executed. }
Each concept has about 8 facets filled in.
Heuristics are tacked onto the facets of the concepts. The more general
the concept, the more heuristic rules it has attached to \footnote{it.}{This
was not done consciously,
and may or may not hold some theoretical significance. }
``Any-concept'' has 121 rules; ``Active concept'' has 24; ``Coalesce'' has 7;
``Set-Insertion'' has none. There are 250 heuristic rules in all, divided into
four flavors (Fillin, Check, Suggest, Interestingness).
Although the mean number of rules is therefore only about 2.2 (\ie,
less than 1 of each flavor) per
concept,
the standard deviation of this is a whopping 127.4. The average number of heuristics
(of a given flavor)
encountered rippling upward from a randomly-chosen concept C (along the
network of generalization links) is about 35, even though the mean path length
is only about \footnote{4.}{If the heuristics were homogeneously distributed among the
concepts, the number of heuristics (of a given type) along a typical path
of length 4 would only be about 2, not 35. If all the heuristics were tacked
onto Anything and Any-concept, the number encountered in any path would be
75, not 35. }
The total number of jobs executed in a typical run (from scratch) is about 200.
The run ends because of space problems, but {\AM}'s performance begins to degrade
near the end anyway.
``Final'' state of {\AM}: 300 concepts, each occupying about 1k. Many are swapped out
onto disk.
Number of winning concepts discovered: 25 (estimated).
Number of acceptable concepts defined: 100 \ffootnote{(est.).}{For a list of
``winners'' and ``acceptables,'' see the final section in Appendix 1. }
Number of losing concepts unfortunately worked on: 60 (est.).
The original 115 concepts have grown to an average size of 2k.
Each concept has about 11 facets filled in.
About 30 seconds of cpu time were allocated to each task, on the average,
but the task typically used only about 18 seconds before quitting.
Total CPU time for a run is about 1 hour. Total cpu time consumed by this
research project was about 500 cpu hours.
Real time: about 1 minute per task, 2 hours per run.
The idea for {\AM} was formulated in the fall of 1974, and {\AM} was coded in the
summer of 1975.
Total time consumed
by this project to date has been about 2600 man-hours: 700 for planning,
500 for coding, 600 for modifying and debugging and experimenting,
700 for writing the thesis, and 100 for editing it into book form.
{\AM} was working by itself:
it received no help from the user, and all its
concepts' Intuitions facets had been removed.
\sectionskip
\sectionbegin[2]{Experiments with AM}
One valuable aspect of {\AM} is that it is amenable to many kind of
interesting experiments. Although {\AM} is too {\it ad hoc} for numerical
results to have much significance, the qualitative results perhaps do
have some valid things to say about research in elementary
mathematics, about automating research, and at least about the
efficacy of various parts of {\AM}'s design.
This section will explain what it means to perform an experiment on
{\AM}, what kinds of experiments are imaginable, which of those are
feasible, and finally will describe the many experiments which were
performed on {\AM}.
By modifying {\AM} in various ways, its behavior can be altered, and the
\4quality\0 of its behavior will change as well. As a drastic
example, one experiment involved forcing {\AM} to select the next task to work on
\4randomly\0 from the agenda, not the top task each time. Needless to say,
the performance was very different from usual.
By careful planning, each experiment can tell us something new about
{\AM}: how valuable a certain piece of it is, how robust a certain
scheme really is, etc. The results of these experiments would then
have something to contribute to a discussion of the ``real
intelligence'' of {\AM} (\eg, what features were superfluous), and
contribute to the design of the ``next'' {\AM}-like system. Generalizing
from those results, one might suggest some hypotheses about the
larger task of automated math research.
Let's cover the different \4kinds\0 of experiments one could perform
on {\AM}:
\listbegin
\numlistentry[1]{Remove individual concept modules, and/or individual heuristic
rules. Then examine how {\AM}'s performance is degraded. {\AM} should
operate even if most of its heuristic rules and most of its concept
modules were excised.
If the remaining fragment of {\AM} is too small, however,
it may not be able to find anything interesting to do. In fact, this
situation was actually encountered experimentally, when the first few
partially complete concepts were inserted. If only a little bit of {\AM}
is removed, the remainder will in fact keep operating without this
``uninteresting collapse.'' The converse situation should also hold:
although still functional with any concept module unplugged, {\AM}'s
performance \4should\0 be noticeably degraded. That is, while not
indispensable, each concept should nontrivially help the others. The
same holds for each individual heuristic rule.
When a piece of {\AM} is removed,
which concepts does
{\AM} then ``miss'' discovering? Is the removed concept/heuristic later
discovered anyway by those which are left in {\AM}? This should
indicate the importance of each kind of concept and rule with which {\AM}
starts.}
\numlistentry[2]{Vary the relative weights given to features by the criteria
which judge aesthetics, interestingness, worth, utility, etc. See
how important each factor is in directing {\AM} along successful routes.
In other words, vary the little numbers in the formulae (both the
global priority-assigning formula and the local
reason-rating
ones inside heuristic rules). One
important result will be some idea of the robustness or ``toughness''
of the numeric weighting factors. If the system easily collapses, it
was too finely tuned initially.}
\numlistentry[3]{Add several new concept modules (including new heuristics
relevant to them) and
see if {\AM} can work in some unanticipated field of mathematics (like
graph theory or calculus or plane geometry).
Do earlier achievements---concepts and conjectures {\AM}
synthesized already---have any
impact in the new domain? Are some specialized heuristics from the
first domain totally wrong here? Do all the old general heuristics
still hold here? Are they sufficient, or are some ``general''
heuristics needed here which weren't needed before? Does {\AM} ``slow
down'' as more and more concepts get introduced?}
\numlistentry[4]{Try to have {\AM}
develop nonmathematical theories (like elementary physics, or program
verification). This might require limiting {\AM}'s freedom to
``ignore a given body of data and move on to something more
interesting.'' The exploration of very non-formalizable fields (\eg,
politics) might require much more than a small augmentation of {\AM}'s
base of concepts. For some such domains, the ``Intuitions'' scheme, which had to
be abandoned for math, might prove valid and valuable.}
\numlistentry[5]{Add several new concepts dealing with proof, and of course add
all the associated heuristic rules. Such rules would advise {\AM} on the
fine points of using various techniques of proof/disproof: when to
use them, what to try next based on why the last attempt failed, etc.
See if the \4kinds\0 of discoveries {\AM} makes are increased.}
\listend
Several experiments (of types 1, 2, and 3 above) were
set up and performed on {\AM}. We're now ready to examine each of them
in detail. The following points are covered for each experiment:
\listbegin
\numlistentry[1]{How was it thought of?}
\numlistentry[2]{What will be gained by it? What would be the implications of the
various possible outcomes?}
\numlistentry[3]{How was the experiment set up? What preparations/modifications had
to be made? How much time (man-hours) did it take?}
\numlistentry[4]{What happened? How did {\AM}'s behavior change? Was this expected?
Analysis.}
\numlistentry[5]{What was learned from this experiment? Can we conclude anything
which suggests new experiments (\eg, use a better machine, a new
domain) or which bears on a more general problem that {\AM} faced (\eg,
a new way to teach math, a new idea about doing math research)?}
\listend
\subsectionbegin[2.1]{Must the Worth numbers be finely tuned?}
Each of the 115 initial concepts
has supplied to it (by the author) a number between 0
and 1000, stored as its Worth facet, which is supposed to represent
the overall value of the concept. ``Compose'' has a higher initial
Worth than ``Structure-delete,'' which is higher than \ffootnote{``Equality''.}{As {\AM}
progresses, it notices something interesting about Equality every now
and then, and pushes its Worth value upwards.}
Frequently, the priority of a task involving C depends on the overall
Worth of C.
How sensitive is
{\AM}'s behavior to the initial settings of the Worth facets? How
finely tuned must these initial Worth values be?
This experiment was thought of because of the ``brittleness'' of many other
AI systems, the amount of fine tuning needed to elicit coherent behavior.
For example, see the discussion of limitations of PUP6, in Lenat[75b].
The author believed that {\AM} was very resilient in this regard, and that
a demonstration of that fact would increase credibility in the power of the ideas
which {\AM} embodies.
To test this, a simple experiment was performed. Just before starting
{\AM}, the mean value of all concepts' Worth values was computed. It turned out to be
roughly 200. Then each concept had its Worth reset to the value \footnote{200.}{The
initial spread of values had previously been from 100 to 600. }
This
was done ``by hand,'' by the author, in a matter of seconds. {\AM} was
then started and run as if there were nothing amiss, and its behavior
was watched carefully.
What happened? By and large, the same major
discoveries were made---and missed---as usual,
in the same order as usual. But whereas {\AM}
proceeded fairly smoothly before, with little superfluous activity, it
now wandered quite blindly for long periods of time, especially at
the very beginning. Once {\AM} ``hooked into'' a line of productive
development, it followed it just as always, with no noticeable
additional wanderings. As one of these lines of developments died
out, {\AM} would wander around again, until the next one was begun.
It took roughly three times as long for each major discovery to occur
as normal. This ``delay'' got shorter and shorter as {\AM} developed
further. In each case, the tasks preceding the discovery and
following it were pretty much the same as normal; only the tasks
``between'' two periods of development were different---and much more
numerous.
The reader may be interested to learn that the Worth values of many
of the concepts---and most of the new concepts---ended up very
close to the same values that they achieved in the original run.
Overrated concepts were investigated and proved boring; underrated
concepts had to wait longer for their chances, but then quickly proved
interesting and had their Worth facets boosted.
The conclusion I draw from this change in behavior is that the Worth
facets are useful for making blind decisions---where {\AM} must choose
based only on the overall worths of the various concepts in its
repertoire. Whenever a specific reason existed, it was far more
influential than the ``erroneous'' Worth values. The close, blind,
random decisions occur between long bursts of specific-reason-driven
periods of creative \footnote{work.}{Incidentally, GPS behaved just this same way.
See, \eg, Newell&Simon[72]. }
The general answer, then, is \4No\1, the initial settings of the Worth
values are not crucial. Guessing reasonable initial values for them is
merely a time-saving device.
This suggests
an interesting research problem: what impact does
the \4quality\0 of initial starting values have on humans? Give several bright
undergraduate math majors the same set of objects and operators to play with,
but tell some of them ({\it i\/}) nothing, and some of them ({\it ii\/}) a certain few pieces of
the system are very promising, then ({\it iii\/})
emphasize a different subset of the objects and operators.
How does ``misinformation'' impede the humans? How about no information?
Have them give verbal protocols about where they are focussing their attention,
and why.
Albeit at a nontrivial cost, the Worth facets did manage to correct
themselves by the end of a \footnote{long}{A couple cpu hours, about a
thousand tasks total selected from the agenda.} run. What would
happen if the Worth facets of those 115 concepts were not only
initialized to 200, but were held fixed at 200 for the duration of
the run?
In this case, the delay still subsided with time. That is, {\AM}
still got more and more ``back to normal'' as it progressed onward. The reason
is because {\AM}'s later work dealt with concepts like Primes,
Square-root, etc., which were so far removed from the initial base of
concepts that the initial concepts' Worths were of little consequence.
Even more drastically, we could force all the Worth facets of all
concepts---even newly created ones---to be kept at the value 200
forever. In this case, {\AM}'s behavior doesn't completely disintegrate,
but that delay factor actually increases with time: apparently, {\AM}
begins to suffer from the exponential growth of ``things to do'' as its
repertoire of concepts grows linearly. Its purposiveness, its
directionality depends on ``focus of attention'' more and more, and if
that feature is removed, {\AM} loses much of its rationality. A factor
of 5 delay doesn't sound that bad ``efficiency-wise,'' but the actual
apparent behavior of {\AM} is as staccato bursts of development,
followed by wild leaps to unrelated concepts. {\AM} no longer can
``permanently'' record its interest in a certain concept.
So we conclude that the Worth facets are ({\it i\/}) not finely tuned, yet
({\it ii\/}) provide important global information about the relative values
of concepts. If the Worth facets are completely disabled, the
rationality of {\AM}'s behavior hangs on the slender thread of ``focus of
attention.''
\subsectionbegin[2.2]{How Finely Tuned is the Agenda?}
The top few candidates on the agenda always appear to be reasonable
(to me). If I work with the system, guiding it, I can cause it to
make a few discoveries it wouldn't otherwise make, and I can cause it
to make its typical ones much faster (by about a factor of two). Thus the
\4very\0 top task is not always the best.
If {\AM} randomly selects one of the top 20 or so tasks on the agenda
each time, what will happen to its behavior? Will it disintegrate,
slow down by a factor of ten, slow down slightly, etc.
This experiment required only a few seconds to set up, but demanded a
familiarity with the LISP functions which make up {\AM}'s control
structure. At a certain point, {\AM} asks for Best-task(Agenda).
Typically, the LISP function Best-task is defined as CAR---\ie,
pick the first member from the list of tasks. What I did was to
redefine Best-task as a function which randomly selected n from the
set $\{1,2,\cdots ,20\}$, and then returned the n$↑{th}$ member of the
job-list.
If you watch the top job on the agenda, it will take about 10 cycles
until {\AM} chooses it. And yet there are many good, interesting,
worthwhile jobs sprinkled among the top 20 on the agenda, so {\AM}'s
performance is cut by merely a factor of three, as far as cpu time per
given major discovery. Part of this better-than-20 behavior is due
to the fact that the eighteenth best task had a much lower priority
rating than the top few; hence it was allocated much less cpu time for
its quantum than the top task would have received. Whether it
succeeded or failed, it used up very little time. Since {\AM} was
frequently working on a low-value task, it was unwilling to spend
much time or space on it. So the mean time allotted per task fell to
about 15 seconds (from the typical 30 secs). Thus, the ``losers'' were
dealt with quickly, so the detriment to cpu-time performance was
softened.
To carry this investigation further, another experiment was carried
out. {\AM} was forced to alternate between choosing the top task on the
agenda, and a randomly chosen one. Although its rate of discovery
was cut by less than half, its behavior was almost as distasteful to
the user as in the last (always-random) experiment.
{\bf Conclusion:} Selecting (on the average) the tenth-best candidate
impedes pro\-gress by a factor of less than ten (about a factor of three), but
it dramatically degrades the ``sensibleness'' of {\AM}'s behavior, the
continuity of its actions. Humans place a big value on absolute
sensibleness, and believe that doing something silly 50\%\ of the time
is \4much\0 worse than half as productive as always doing the
next most logical task.
{\bf Corollary:} Having 20 multi-processors simultaneously execute the top
20 jobs will increase the rate of ``big'' discoveries,
but not by a
full factor of twenty---more like a factor of three.
Another experiment in this same vein was done, one which was designed
to be far more crippling to {\AM}. Be-threshhold was held at 0 always,
so \4any\0 task which ever got proposed was kept forever on the
agenda, no matter how low its priority. The Best-task function was
modified so it randomly selected any member of the list of jobs. As a
final insult, the Worth facets of all the concepts were initialized
to 200 before starting {\AM}.
{\bf Result:} Many ``explosive'' tasks were chosen, and the number of new
concepts increased rapidly. As expected, most of these were real
``losers.'' There seemed no rationality to {\AM}'s sequence of actions,
and it was quite boring to watch it floundering so. The typical
length of the agenda was about 500, and {\AM}'s performance was ``slowed''
by at least a couple orders of magnitude. A more subjective measure
of its ``intelligence'' would say that it totally collapsed under this
random scheme.
{\2Conclusion:\0} Having an unlimited number of processors
simultaneously execute all the jobs on the agenda would increase the
rate at which {\AM} made big discoveries, at an ever-accelerating pace
(since the length of the agenda would grow exponentially).
Having a uniprocessor \4simulate\0 such parallel processing would be
a losing idea, however. The truly ``intelligent'' behavior {\AM} exhibits
is its plausible sequencing of tasks.
\subsectionbegin[2.3]{How Valuable is Tacking Reasons onto Each Task?}
Let's dig inside the agenda scheme now. One idea I've repeatedly emphash∂ed
is the attaching of reasons to the tasks on the agenda,
and using those reasons and their ratings to compute the overall
priority value assigned to each task. An experiment was done to
ascertain the amount of intelligence that was emanating from that
idea.
The global formula assigning a priority value to each job was
modified. We let it still be a function of the reasons for the job,
but we ``trivialized'' it: the priority of a job was computed as simply
the number of reasons it has (normalized by multiplying by 100, and
cut-off if over 1000).
This raised the new question of what to do if several jobs all have
the same priority. In that case, I had {\AM} execute them in
stack-order (most recent job tackled \ffootnote{first).}{Why? Because
({\it i\/}) it sounds right
intuitively to me, ({\it ii\/}) this is akin to human focus of attention, and
mainly because ({\it iii\/}) this is what {\AM} did anyway, with no extra
modification. }
{\bf Result:} I secretly expected that this wouldn't make too much
difference on {\AM}'s apparent level of directionality, but such was
definitely not the case. While {\AM} started by doing tasks which were
far more interesting and daring than usual (\eg, filling in various
Coalescings right away), it soon became obvious that {\AM} was being
swayed by hitherto trivial coding decisions. Whole classes of
tasks---like Checking Examples of C---were never chosen, because they
only had one or two reasons supporting them. Previously, one or two
good reasons were sufficient. Now, tasks with several poor reasons
were rising to the top and being worked on. Even the LIFO (stack)
policy for resolving ties didn't keep {\AM}'s attention focussed.
\2Conclusion:\0 Unless a conscious effort is made to ensure that
each reason really will carry roughly an equal amount of semantic
impact (charge, weight), it is not acceptable merely to choose tasks
on the basis of how many reasons they possess. Even in those
constricted equal-weight cases, the similarities between reasons
supporting a task should be taken into account.
\subsectionbegin[2.4]{What If Certain Concepts Are Eliminated or Added?}
Feeling in a reckless mood one day, I eliminated the concept
``Equality'' from {\AM} to see what it would then do. Equality was a key
concept, because {\AM} discovered Numbers via the technique of
generalizing the relation ``Equality'' (exact equality of 2 given
structures, at all internal levels). What would happen if we
eliminate this path? Will {\AM} rederive Equality? Will it get to
Cardinality via another route? Will it do some set-theoretic things?
{\bf Result:} Rather disappointing. {\AM} never did rederive Equality or
Car\-di\-na\-li\-ty. It spent its time thrashing about with various flavors
of data-structures (unordered \vs\ ordered, multiple-elements allowed
or not, etc.), deriving large quantities of boring results about
them. Very many composings and coalescings were done, but no exciting
new operations were produced.
A kinder type of experiment would be to \4add\0 a few concepts.
One such experiment was done: the addition of Cartesian-product.
This operation, named C-PROD, accepts two sets as arguments and
returns a third set as its value: the Cartesian product of the first
two.
{\bf Result:} The only significant change in {\AM}'s behavior was that TIMES
was discovered first as the restriction of C-PROD to Canonical-Bags.
When it soon was rediscovered in a few other guises, its Worth was
even higher than usual. {\AM} spent even more time exploring concepts
concerned with it, and deviated much less for quite a long time.
{\bf Synthesis of the above experiments:} It appears that {\AM} may really be
more specialized than expected; {\AM} may be able to forge ahead
only along one or two
main lines of development---at least if we demand it make very
interesting, well-known discoveries quite frequently. Removing
certain key concepts can be disastrous. On the other hand, adding
some carefully chosen new ones can greatly enhance {\AM}'s
directionality (hence its apparent intelligence).
{\bf Conclusion:} In its current state,
{\AM} is thus seen to be \4minimally competent:\0 if any
knowledge is removed, it appears much less intelligent; if any is added, it
appears slightly smarter.
{\bf Suggestion for future research:}
A hypothesis, which should be tested experimentally, is that the importance
of the presence of each individual
concept decreases as the number of---and \4depth\0 of---the
synthesized concepts increases.
That is, any excision would eventually ``heal over,'' given enough time.
The failure of {\AM} to verify this may be due to the relatively small amount of
development in toto (an hour of cpu time, a couple hundred new concepts, a few
levels deeper than the starting ones).
\subsectionbegin[2.5]{Can AM Work in a New Domain: Plane Geometry?}\par
\epigraph{A true strategy should be domain-independent.}
\author{Adams}
\epigraphend
\noindent As McDermott points out [McDermott 76], just labelling a
bunch of heuristics ``Operation heuristics'' doesn't suddenly make them
relevant to any operation; all it does is give that impression to a
human who looks at the code (or a description of it). Since the
author hoped that the labelling really was fair, an experiment was
done to test this. Such an experiment would be a key determiner of
how general {\AM} is.
How might one demonstrate that the ``Operation'' heuristics really
could be useful or dealing with any operation, not just the ones
already in {\AM}'s initial base of concepts?
One way would be to pick a new domain, and see how many old
heuristics contribute to---and how many new heuristics have to be
added to elicit---some sophisticated behavior in that domain. Of
course, some new primitive concepts would have to be introduced
(defined) to {\AM}.
Only one experiment of this type was attempted. The author added a
new base of concepts to the ones already in {\AM}. Included were:
Point, Line, Angle, Triangle, Equality of points/lines/angles/triangles.
These simple plane geometry notions were sufficiently removed from
set-theoretic ones that those preexisting specific concepts would be
totally irrelevant; on the other hand, the general concepts---the
ones with the heuristics attached---would still be just as relevant:
Any-concept, Operation, Predicate, Structure, etc.
For each new geometric concept, the only facet filled in was its
Definition. For the new predicates and operators, their Domain/range
entries were also supplied.
No new heuristics were added to {\AM}.
{\bf Result:} Fairly good behavior. {\AM} was able to find examples of all
the concepts defined, and to use the character of the results of
those examples searches to determine intelligent courses of action.
{\AM} derived congruence and similarity of triangles, and several other
well-known simple concepts. An unusual result was the repeated
derivation of the idea of ``timberline.'' This is a predicate on two
triangles: Timberline(T1,T2) iff T1 and T2 have a common angle, and
the side opposite that angle in the two triangles are parallel:
\vfill
\ctrline{(Timberline Diagram)}
\eject
%\vskip 3in % Insert the timberline diagram here.
Since {\AM} kept rederiving this in new ways, it seems surprising that
there is no very common name for the concept. It could be that {\AM} is
using techniques which humans don't---at least, for geometry.
The only new bit of knowledge that came out of this experiment was a
``use'' for Goldbach's conjecture: any angle (0--180$\deg$) can be
built up (to within 1$\deg$) as the sum of two angles of prime
degrees (<180$\deg$). This result is admittedly esoteric at best, but is
nonetheless worth reporting.
The total effort expended on this experiment was: a few months of
subconscious processing, ten hours of designing the base of concepts
to insert, ten hours inserting and debugging them. The whole task
took about two days of real time.
The conclusion to be drawn is that heuristics really can be generally
useful; their attachment to general-sounding concepts is not an
\ffootnote{illusion.}{Or it's a very good illusion! But note: If this phenomenon is repeatable
and useful, then (like Newtonian mechanics) it won't pragmatically
matter whether it's only
an approximation to reality.}
The implication of this is that {\AM} can be grown
incrementally, domain by domain. Adding expertise in a new domain
requires only the introduction of concepts local to that domain; all
the very general concepts---and their heuristics---already exist
and can be used with no change.
The author feels that this result can be generalized: {\AM} can be
expanded in scope, even to nonmathematical fields of endeavor. In
each field, however, the rankings of the various \footnote{heuristics}{The
numeric values that
should be returned by
the local ratings formulae which are attached to the
heuristic rules. } may shift slightly. As the domain
gets further away from mathematics, various heuristics are important
which were ignorable before (\eg, those dealing with ethics), and
some pure math research-oriented heuristics become less applicable
(``giving up and moving on to another topic'' is not an acceptable
response to the 15-puzzle, nor to a hostage situation).
Well, it sounds as if we've shifted our orientation from ``Results'' to
a subjective evaluation of those results. Let's start a new chapter
to legitimize this type of commentary.
\worldend